 robot system


Robix: A Unified Model for Robot Interaction, Reasoning and Planning

Fang, Huang, Zhang, Mengxi, Dong, Heng, Li, Wei, Wang, Zixuan, Zhang, Qifeng, Tian, Xueyun, Hu, Yucheng, Li, Hang

arXiv.org Artificial Intelligence

We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex instructions, plan long-horizon tasks, and interact naturally with humans within an end-to-end framework. Robix further introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix leverages chain-of-thought reasoning and adopts a three-stage training strategy: (1) continued pretraining to enhance foundational embodied reasoning abilities including 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised finetuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments demonstrate that Robix outperforms both open-source and commercial baselines (e.g., GPT-4o and Gemini 2.5 Pro) in interactive task execution, demonstrating strong generalization across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.


Enabling Generic Robot Skill Implementation Using Object Oriented Programming

Farrukh, Abdullah, Wagner, Achim, Ruskowski, Martin

arXiv.org Artificial Intelligence

Developing robotic algorithms and integrating a robotic subsystem into a larger system can be a difficult task. Particularly in small and medium-sized enterprises (SMEs) where robotics expertise is lacking, implementing, maintaining and developing robotic systems can be a challenge. As a result, many companies rely on external expertise through system integrators, which, in some cases, can lead to vendor lock-in and external dependency. In the academic research on intelligent manufacturing systems, robots play a critical role in the design of robust autonomous systems. Similar challenges are faced by researchers who want to use robotic systems as a component in a larger smart system, without having to deal with the complexity and vastness of the robot interfaces in detail. In this paper, we propose a software framework that reduces the effort required to deploy a working robotic system. The focus is solely on providing a concept for simplifying the different interfaces of a modern robot system and using an abstraction layer for different manufacturers and models. The Python programming language is used to implement a prototype of the concept. The target system is a bin-picking cell containing a Yaskawa Motoman GP4.
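
The abstraction-layer idea in this abstract can be illustrated with a minimal Python sketch. The class and function names below (`RobotBackend`, `LoggingBackend`, `pick_and_place`) are hypothetical and are not taken from the paper; the sketch only assumes that a skill is written against a vendor-neutral interface and that each manufacturer or model supplies its own backend.

```python
from abc import ABC, abstractmethod


class RobotBackend(ABC):
    """Hypothetical vendor-neutral interface; not the paper's actual API."""

    @abstractmethod
    def move_to(self, pose):
        """Move the tool to a Cartesian pose given as (x, y, z)."""

    @abstractmethod
    def set_gripper(self, closed):
        """Close the gripper if `closed` is True, otherwise open it."""


class LoggingBackend(RobotBackend):
    """Stand-in backend that records commands instead of driving hardware."""

    def __init__(self):
        self.log = []

    def move_to(self, pose):
        self.log.append(("move_to", tuple(pose)))

    def set_gripper(self, closed):
        self.log.append(("gripper", "close" if closed else "open"))


def pick_and_place(robot, pick_pose, place_pose):
    """A generic skill that depends only on the abstract interface."""
    robot.move_to(pick_pose)
    robot.set_gripper(True)
    robot.move_to(place_pose)
    robot.set_gripper(False)
```

Under this assumption, supporting another robot would mean adding one more subclass of `RobotBackend`, while `pick_and_place` stays unchanged.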


A Deep Reinforcement Learning Environment for Particle Robot Navigation and Object Manipulation

Shen, Jeremy, Xiao, Erdong, Liu, Yuchen, Feng, Chen

arXiv.org Artificial Intelligence

Particle robots are novel biologically-inspired robotic systems where locomotion can be achieved collectively and robustly, but not independently. While their control is currently limited to a hand-crafted policy for basic locomotion tasks, such a multi-robot system could potentially be controlled via Deep Reinforcement Learning (DRL) for different tasks more efficiently. However, the particle robot system presents a new set of challenges for DRL differing from existing swarm robotics systems: the low degrees of freedom of each robot and the increased necessity of coordination between robots. We present a 2D particle robot simulator using the OpenAI Gym interface and Pymunk as the physics engine, and introduce new tasks and challenges to research the underexplored applications of DRL in the particle robot system. Moreover, we use Stable-baselines3 to provide a set of benchmarks for the tasks. Current baseline DRL algorithms show signs of achieving the tasks but are not yet able to reach the performance of the hand-crafted policy. Further development of DRL algorithms is necessary in order to accomplish the proposed tasks.


LEGATO: Cross-Embodiment Imitation Using a Grasping Tool

Seo, Mingyo, Park, H. Andy, Yuan, Shenli, Zhu, Yuke, Sentis, Luis

arXiv.org Artificial Intelligence

Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. Using this gripper, we train visuomotor policies via imitation learning, applying a motion-invariant transformation to compute the training loss. Gripper motions are then retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework's effectiveness in learning and transferring visuomotor skills across various robots. More information can be found at the project page: https://ut-hcrl.github.io/LEGATO.


Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation

Chuang, Ian, Lee, Andrew, Gao, Dechen, Soltani, Iman

arXiv.org Artificial Intelligence

Imitation learning has demonstrated significant potential in performing high-precision manipulation tasks using visual feedback from cameras. However, it is common practice in imitation learning for cameras to be fixed in place, resulting in issues like occlusion and limited field of view. Furthermore, cameras are often placed in broad, general locations, without an effective viewpoint specific to the robot's task. In this work, we investigate the utility of active vision (AV) for imitation learning and manipulation, in which, in addition to the manipulation policy, the robot learns an AV policy from human demonstrations to dynamically change the robot's camera viewpoint to obtain better information about its environment and the given task. We introduce AV-ALOHA, a new bimanual teleoperation robot system with AV, an extension of the ALOHA 2 robot system, incorporating an additional 7-DoF robot arm that only carries a stereo camera and is solely tasked with finding the best viewpoint. This camera streams stereo video to an operator wearing a virtual reality (VR) headset, allowing the operator to control the camera pose using head and body movements. The system provides an immersive teleoperation experience, with bimanual first-person control, enabling the operator to dynamically explore and search the scene and simultaneously interact with the environment. We conduct imitation learning experiments with our system both in the real world and in simulation, across a variety of tasks that emphasize viewpoint planning. Our results demonstrate the effectiveness of human-guided AV for imitation learning, showing significant improvements over fixed cameras in tasks with limited visibility. Project website: https://soltanilara.github.io/av-aloha/


Interpreting and learning voice commands with a Large Language Model for a robot system

Stankevich, Stanislau, Dudek, Wojciech

arXiv.org Artificial Intelligence

Robots are increasingly common in both industry and daily life, such as in nursing homes where they can assist staff. A key challenge is developing intuitive interfaces for easy communication. The use of Large Language Models (LLMs) like GPT-4 has enhanced robot capabilities, allowing for real-time interaction and decision-making. This integration improves robots' adaptability and functionality. This project focuses on merging LLMs with databases to improve decision-making and enable knowledge acquisition for the request interpretation problems.


Robot Agnostic Visual Servoing considering kinematic constraints enabled by a decoupled network trajectory planner structure

Schempp, Constantin, Friedrich, Christian

arXiv.org Artificial Intelligence

We propose a visual servoing method consisting of a detection network and a velocity trajectory planner. First, the detection network estimates the object's position and orientation in the image space. These estimates are then normalized and filtered. The resulting direction and orientation are the input to the trajectory planner, which considers the kinematic constraints of the robotic system in use. This allows safe and stable control, since the kinematic boundary values are taken into account in planning. Also, because direction estimation and velocity planning are separated, the learning part of the method does not directly influence the control value. This also enables the transfer of the method to different robotic systems without retraining, making it robot agnostic. We evaluate our method on different visual servoing tasks with and without clutter on two different robotic systems. Our method achieved mean absolute position errors of <0.5 mm and orientation errors of <1°. Additionally, we transferred the method to a new system which differs in robot and camera, emphasizing the robot-agnostic capability of our method.
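
The decoupling described in this abstract, where a learned direction estimate feeds a planner that enforces kinematic limits, can be sketched as follows. This is only an illustration under stated assumptions: the function `limit_velocity`, its parameters `v_max` and `a_max`, and the simple clamping rule are hypothetical and are not the planner used in the paper.

```python
import math


def limit_velocity(direction, prev_velocity, v_max, a_max, dt):
    """Turn a direction estimate into a velocity command that respects limits.

    Hypothetical sketch: `v_max` bounds the speed and `a_max` bounds how fast
    the commanded velocity may change within one control period `dt`.
    """
    norm = math.hypot(*direction)
    if norm == 0.0:
        # No direction information: aim for a standstill.
        target = [0.0] * len(direction)
    else:
        # Move along the estimated direction at the maximum allowed speed.
        target = [v_max * d / norm for d in direction]

    # Limit the change in velocity so the acceleration bound is not exceeded.
    delta = [t - p for t, p in zip(target, prev_velocity)]
    delta_norm = math.hypot(*delta)
    max_step = a_max * dt
    if delta_norm > max_step:
        delta = [d * max_step / delta_norm for d in delta]
    return [p + d for p, d in zip(prev_velocity, delta)]
```

Because the network output only sets the direction, the limits live entirely in the planner, so under this assumption moving to another robot would mean changing `v_max` and `a_max` rather than retraining.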


Physics-Based Causal Reasoning for Safe & Robust Next-Best Action Selection in Robot Manipulation Tasks

Cannizzaro, Ricardo, Groom, Michael, Routley, Jonathan, Ness, Robert Osazuwa, Kunze, Lars

arXiv.org Artificial Intelligence

Safe and efficient object manipulation is a key enabler of many real-world robot applications. However, this is challenging because robot operation must be robust to a range of sensor and actuator uncertainties. In this paper, we present a physics-informed causal-inference-based framework for a robot to probabilistically reason about candidate actions in a block stacking task in a partially observable setting. We integrate a physics-based simulation of the rigid-body system dynamics with a causal Bayesian network (CBN) formulation to define a causal generative probabilistic model of the robot decision-making process. Using simulation-based Monte Carlo experiments, we demonstrate our framework's ability to successfully: (1) predict block tower stability with high accuracy (Pred Acc: 88.6%); and, (2) select an approximate next-best action for the block stacking task, for execution by an integrated robot system, achieving 94.2% task success rate. We also demonstrate our framework's suitability for real-world robot systems by demonstrating successful task executions with a domestic support robot, with perception and manipulation sub-system integration. Hence, we show that by embedding physics-based causal reasoning into robots' decision-making processes, we can make robot task execution safer, more reliable, and more robust to various types of uncertainty.
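
The idea of scoring candidate actions by simulation-based Monte Carlo sampling can be illustrated with a deliberately simplified sketch. It does not reproduce the paper's physics simulation or causal Bayesian network: the one-dimensional stability rule (a block is stable when its centre lies over the supporting block), the Gaussian placement noise, and the names `estimate_stability` and `next_best_offset` are all assumptions made for illustration.

```python
import random


def estimate_stability(nominal_offset, noise_std, block_width,
                       n_samples=10000, seed=0):
    """Monte Carlo estimate of the probability that a placed block is stable.

    Toy model (an assumption, not the paper's simulator): the realised offset
    is the nominal offset plus Gaussian noise, and the block is stable when
    its centre lies within half a block width of the support centre.
    """
    rng = random.Random(seed)
    stable = 0
    for _ in range(n_samples):
        offset = rng.gauss(nominal_offset, noise_std)
        if abs(offset) < block_width / 2:
            stable += 1
    return stable / n_samples


def next_best_offset(candidates, noise_std, block_width):
    """Pick the candidate placement with the highest estimated stability."""
    return max(
        candidates,
        key=lambda c: estimate_stability(c, noise_std, block_width),
    )
```

For example, `next_best_offset([0.0, 0.4, 0.6], noise_std=0.1, block_width=1.0)` compares three hypothetical placements and returns the one whose sampled outcomes are most often stable.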


Enabling Digitalization in Modular Robotic Systems Integration

Tola, Daniella

arXiv.org Artificial Intelligence

Integrating robot systems into manufacturing lines is a time-consuming process. In the era of digitalization, the research and development of new technologies is crucial for improving integration processes. Numerous challenges, including the lack of standardization, as well as intricate stakeholder relationships, complicate the process of robotic systems integration. This process typically consists of acquisition, integration, and deployment of the robot systems. This thesis focuses on three areas that help automate and simplify robotic systems integration. In the first area, related to acquisition, a constraint-based configurator is demonstrated that resolves compatibility challenges between robot devices, and automates the configuration process. This reduces the risk of integrating incompatible devices and decreases the need for experts during the configuration phase. In the second area, related to integration, the interoperable modeling format, Unified Robot Description Format (URDF), is investigated, where a detailed analysis is performed, revealing significant inconsistencies and critical improvements. This format is widely used for kinematic modeling and 3D visualization of robots, and its models can be reused across simulation tools. Improving this format benefits a wide range of users, including robotics engineers, researchers, and students. In the third area, related to deployment, Digital Twins (DTs) for robot systems are explored, as these improve efficiency and reduce downtime. A comprehensive literature review of DTs is conducted, and a case study of modular robot systems is developed. This research can accelerate the adoption of DTs in the robotics industry. These insights and approaches improve the process of robotic systems integration, offering valuable contributions that future research can build upon, ultimately driving efficiency, and reducing costs.


Nonlinear vibration of a dipteran flight robot system with rotational geometric nonlinearity

Han, Yanwei, Zhang, Zijian

arXiv.org Artificial Intelligence

The dipteran flight mechanism of insects is commonly used to design nonlinear flight robot systems. However, the dynamic response of the click mechanism of the nonlinear robot system with multiple stability is still unclear. In this paper, a novel dipteran robot model with a click mechanism is proposed based on the multiple stability of snap-through buckling. The equation of motion of the nonlinear flight robot system is obtained using the Euler-Lagrange equation. The nonlinear potential energy, the elastic force, the equilibrium bifurcation, and the equilibrium stability are investigated to show the multiple stability characteristics. The transient sets of bifurcation and persistent set of regions in the system parameter plane and the corresponding phase portraits are obtained with multiple stability of single- and double-well behaviors. Then, the periodic free vibration responses are defined by the analytical solutions of three kinds of elliptic functions, and the amplitude-frequency responses are investigated by numerical integration. Based on the topological equivalent method, the chaotic thresholds of the homoclinic orbits for the chaotic vibration of the harmonically forced robot system are derived to show the chaotic parametric condition. Finally, a prototype of the nonlinear flapping robot is manufactured and the experimental system is set up. Tests of the nonlinear static moment of force curves, periodic response, and dynamic flight vibration of the dipteran robot system are carried out. It is shown that the test results agree well with the theoretical analysis and numerical simulation. These results have potential application in the structural design of efficient flight robots.
